feat: Add presigned URL target models and cleanups#3486

Draft
cau-git wants to merge 6 commits into main from cau/service-models-presigned-batch-prep

cau-git commented May 21, 2026

This PR updates the shared docling service datamodels to support upcoming server-managed artifact delivery, cleanly separate internal result construction from existing API response shapes, and tighten the regular convert endpoint request contract while preserving flexibility for future endpoint expansion.

  • add a new PresignedUrlTarget and corresponding artifact/result response models so conversion outputs can be represented as per-document downloadable artifact references
  • introduce DocumentResultItem plus mapping helpers to separate internal document result handling from legacy wire models such as ExportResult and ConvertDocumentResponse, without changing existing response payloads
  • add regular-endpoint-specific request models that exclude S3 sources, so the stricter regular convert contract is modeled explicitly instead of narrowing the global shared source union
  • reject ZIP URLs only for the regular HTTP source path by using a regular-only HTTP request subclass, leaving the shared HTTP source model reusable for future endpoints with different rules
  • export the new datamodel types and add focused tests covering target parsing, regular endpoint source validation, shared-vs-regular HTTP ZIP behavior, and internal-to-wire result serialization
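The regular-only source contract described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the class names, the fields, and the suffix-based ZIP check are hypothetical, and stdlib dataclasses stand in for the service's model classes so the snippet stays dependency-free.

```python
from dataclasses import dataclass
from typing import Union
from urllib.parse import urlparse


@dataclass(frozen=True)
class HttpSourceRequest:
    """Shared HTTP source (hypothetical name): accepts any URL, ZIP included."""
    url: str


@dataclass(frozen=True)
class RegularHttpSourceRequest(HttpSourceRequest):
    """Regular-endpoint variant (hypothetical name): same fields, ZIP rejected."""

    def __post_init__(self) -> None:
        # Assumption: a ZIP URL is recognised by its path suffix.
        # Only the path is inspected so a query string cannot hide the suffix.
        if urlparse(self.url).path.lower().endswith(".zip"):
            raise ValueError("ZIP URLs are not accepted by the regular convert endpoint")


@dataclass(frozen=True)
class FileSourceRequest:
    """Inline file source (hypothetical fields)."""
    filename: str
    base64_string: str


@dataclass(frozen=True)
class S3SourceRequest:
    """S3 source (hypothetical fields); stays in the shared union only."""
    bucket: str
    key_prefix: str


# The shared union keeps every source kind available to other endpoints.
SharedSource = Union[HttpSourceRequest, FileSourceRequest, S3SourceRequest]

# The regular union is modeled explicitly: no S3, and HTTP is the stricter subclass.
RegularSource = Union[RegularHttpSourceRequest, FileSourceRequest]
```

Putting the ZIP rule on a subclass rather than on the shared model means `HttpSourceRequest(url="https://example.com/batch.zip")` still constructs, while the same URL passed to `RegularHttpSourceRequest` raises `ValueError`.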

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

- add  plus presigned artifact/result response models
- introduce  and helpers to separate internal results from wire models
- add regular-only source request models, including ZIP URL rejection and S3 exclusion
- export new datamodel types and add focused schema/serialization tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

github-actions Bot commented May 21, 2026

DCO Check Failed

Hi @cau-git, your pull request has failed the Developer Certificate of Origin (DCO) check.

This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format.


🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>

I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: cd1203f91fcd1f940337ece8f4fb9f0595f46abb"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report


mergify Bot commented May 21, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
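To check a title locally against the rule above, the same pattern can be applied with Python's `re` module. This is a small sketch; the helper name `follows_conventional_commit` is made up for illustration, and only the regular expression itself comes from the rule.

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def follows_conventional_commit(title: str) -> bool:
    """Return True when the title starts with a type, optional scope, optional '!', then ':'."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(follows_conventional_commit("feat: Add presigned URL target models and cleanups"))
print(follows_conventional_commit("Add presigned URL target models and cleanups"))
```

The pattern is case-sensitive and anchored at the start, so a title such as `Feat: ...` or one without a type prefix would not satisfy it.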


codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 96.61017% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| docling/datamodel/service/responses.py | 94.73% | 2 Missing ⚠️ |
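The percentages in the report are consistent with simple line-count ratios. The totals used below (59 changed lines overall, 38 in `responses.py`) are not stated in the report; they are inferred here as the counts that reproduce the reported figures with 2 uncovered lines, so treat them as an assumption.

```python
from fractions import Fraction


def patch_coverage(total_lines: int, missing_lines: int) -> Fraction:
    """Covered share of the changed lines, as an exact percentage."""
    return Fraction(total_lines - missing_lines, total_lines) * 100


# Inferred totals (assumption): 59 changed lines overall, 38 in responses.py.
overall = patch_coverage(59, 2)
responses_py = patch_coverage(38, 2)

print(round(float(overall), 5))       # compare with the reported 96.61017%
print(int(responses_py * 100) / 100)  # truncated to two decimals, compare with 94.73%
```

Under these assumed totals, the two uncovered lines in `responses.py` account for all of the missing patch coverage.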

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
)


class PresignedUrlConvertDocumentResponse(BaseModel):

Currently used for S3Target and PutTarget responses. How can this be deprecated? PresignedUrlConvertResponse is not going to replace it, since we said that an S3 target cannot list the produced documents by default: the list may be very large and would need pagination, etc.

)


def _to_convert_document_response(

Same question as for ExportResult: why are ConvertDocumentResponse and DocumentResultItem both needed? They appear to have 100% overlap.

If any of this is for backward-compatibility, please explain.

Comment thread docling/datamodel/service/responses.py Outdated
Comment on lines +170 to +176
def _to_export_result(item: DocumentResultItem) -> ExportResult:
    return ExportResult(
        content=item.document,
        status=item.status,
        errors=item.errors,
        timings=item.timings,
    )

Why is ExportResult needed when we have DocumentResultItem, or vice versa?
The only differences seem to be:

  • ExportResult names the field "content" instead of "document", with the same type.
  • "kind" is present on ExportResult.

It makes no sense to me. If any of this is for backward-compatibility, please explain.

cau-git added 4 commits May 22, 2026 13:12
…semantics

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>